TutoRial - Part 2

Marine Ecosystem Dynamics

Authors

Kinlan Jan

Baptiste Serandour

The goal of this tutorial is to become familiar with the tidyverse, especially tidyr, dplyr, and ggplot2, and to learn how to handle our datasets.

Pipes

Pipes are a powerful way to let data flow from one function to the next. They were first implemented in the magrittr package, where the pipe looks like this: %>%. The idea proved so useful that version 4.1.0 of R introduced a native pipe operator, which looks like this: |>.
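As a minimal sketch of what the pipe does, the snippet below writes the same computation twice: once as nested calls and once with the native pipe (which requires R 4.1.0 or later). The numbers are arbitrary and only serve as an illustration.

```r
# Nested calls: read from the inside out
nested <- round(sqrt(sum(c(1, 4, 9))), 1)

# Same computation with the native pipe: read from top to bottom.
# The left-hand side becomes the first argument of the next function.
piped <- c(1, 4, 9) |>
  sum() |>
  sqrt() |>
  round(1)

# Both forms give the same result
identical(nested, piped)
```

The piped version lists the steps in the order in which they are executed, which is the main reason for using it in longer chains.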

Tips
  1. The easiest way to write the pipe is to use the keyboard shortcut ⌘/Ctrl + ⇧ + M.
  2. In the RStudio options (Tools > Global Options > Code), you can choose whether this shortcut inserts the native pipe operator.

Exercises

  • Rewrite these chunks of code using pipes
sum(c(1,2,3,4,5))
c(1,2,3,4,5) |> 
  sum()
round(mean(seq(from = 0, to = 1.5, by = 0.2)), 2)
seq(from = 0, to = 1.5, by = 0.2) |> 
  mean() |> 
  round(2)
plot(sample(rnorm(10000, 20, 10), 200, replace = TRUE),y = sample(1:20, 200, replace = TRUE))
rnorm(10000, 20, 10) |> 
  sample(200, replace = TRUE) |> 
  plot(y = sample(1:20, 200, replace = TRUE))

Tidy the data with tidyr

As seen during the lecture, the concept of a tidy table is that:

  1. Each variable is in its own column
  2. Each observation is in its own row

To reach the tidy concept, we can use 4 key functions:

  1. pivot_longer - it transforms a wide dataset into a long dataset
  2. pivot_wider - it transforms a long dataset into a wide dataset
  3. unite - it unifies 2 columns into 1
  4. separate - it separates 1 column into 2
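The four functions listed above can be sketched on a small, made-up table. The data frame `wide` and its values are hypothetical and are not taken from PlanktonData; they only illustrate the shape of the input and output of each function.

```r
library(tidyr)

# Hypothetical toy data: one row per station, one column per year
wide <- data.frame(
  Station = c("A", "B"),
  `2009`  = c(1.5, 2.0),
  `2010`  = c(3.0, 4.5),
  check.names = FALSE
)

# pivot_longer: the year columns become two columns, Year and Biomass
long <- wide |>
  pivot_longer(cols = !Station, names_to = "Year", values_to = "Biomass")

# pivot_wider: back to one column per year
wide_again <- long |>
  pivot_wider(names_from = Year, values_from = Biomass)

# unite: paste Station and Year into a single column
united <- long |>
  unite(col = "Station_Year", Station, Year, sep = "_")

# separate: split that column back into its two parts
separated <- united |>
  separate(col = Station_Year, into = c("Station", "Year"), sep = "_")
```

Note that pivot_longer and pivot_wider undo each other, and so do unite and separate, which makes it easy to check a transformation by reversing it.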

Exercises

From this point onwards, we will use the datasets available in the package PlanktonData. All the raw data come from SHARKweb.

  • Install devtools and then PlanktonData.
install.packages("devtools")
devtools::install_github("KMGJan/PlanktonData")
  • Load the dataset zooplankton in your environment
Month_abb Year Station Coordinates Group Taxa Biomass
Jan 2009 BY15 20.05000/57.33333 Copepoda Acartia 6.6503187
Jan 2009 BY31 18.23333/58.58812 Copepoda Acartia 1.8169941
Jan 2009 BY5 15.98333/55.25000 Copepoda Acartia 5.5620974
Jan 2009 BY15 20.05000/57.33333 Copepoda Centropages 5.7385615
Jan 2009 BY31 18.23333/58.58812 Copepoda Centropages 1.2287586
Jan 2009 BY5 15.98333/55.25000 Copepoda Centropages 14.4052240
Jan 2009 BY15 20.05000/57.33333 Copepoda Pseudocalanus 10.5228820
Jan 2009 BY31 18.23333/58.58812 Copepoda Pseudocalanus 5.6339840
Jan 2009 BY5 15.98333/55.25000 Copepoda Pseudocalanus 21.5947750
Jan 2009 BY15 20.05000/57.33333 Copepoda Temora 9.7254882
Jan 2009 BY31 18.23333/58.58812 Copepoda Temora 4.9934649
Jan 2009 BY5 15.98333/55.25000 Copepoda Temora 45.7385290
Jan 2009 BY15 20.05000/57.33333 Rotatoria Synchaeta 0.3921570
Jan 2009 BY31 18.23333/58.58812 Rotatoria Synchaeta 0.4705890
Jan 2009 BY5 15.98333/55.25000 Rotatoria Synchaeta 0.3921570
Jan 2010 BY15 20.05000/57.33333 Copepoda Acartia 2.4673193
Jan 2010 BY31 18.23333/58.58812 Copepoda Acartia 2.2483670
Jan 2010 BY15 20.05000/57.33333 Copepoda Centropages 0.3071893
Jan 2010 BY31 18.23333/58.58812 Copepoda Centropages 0.3856208
Jan 2010 BY31 18.23333/58.58812 Copepoda Eurytemora 0.0849674
Jan 2010 BY15 20.05000/57.33333 Copepoda Pseudocalanus 13.6013010
Jan 2010 BY31 18.23333/58.58812 Copepoda Pseudocalanus 2.6601280
Jan 2010 BY15 20.05000/57.33333 Copepoda Temora 7.5490209
Jan 2010 BY31 18.23333/58.58812 Copepoda Temora 8.4183010
Jan 2010 BY15 20.05000/57.33333 Rotatoria Synchaeta 0.1568628
Jan 2010 BY31 18.23333/58.58812 Rotatoria Synchaeta 0.4117650
Jan 2011 BY15 20.05000/57.33333 Copepoda Acartia 5.0653670
Jan 2011 BY31 18.23333/58.58812 Copepoda Acartia 6.0653592
Jan 2011 BY5 15.98333/55.25000 Copepoda Acartia 9.6209217
Jan 2011 BY15 20.05000/57.33333 Copepoda Centropages 2.9803908
Jan 2011 BY31 18.23333/58.58812 Copepoda Centropages 0.7058820
Jan 2011 BY5 15.98333/55.25000 Copepoda Centropages 1.8692823
Jan 2011 BY31 18.23333/58.58812 Copepoda Eurytemora 1.6078432
Jan 2011 BY15 20.05000/57.33333 Copepoda Pseudocalanus 49.6601350
Jan 2011 BY31 18.23333/58.58812 Copepoda Pseudocalanus 3.3071872
Jan 2011 BY5 15.98333/55.25000 Copepoda Pseudocalanus 3.7777770
Jan 2011 BY15 20.05000/57.33333 Copepoda Temora 36.4313840
Jan 2011 BY31 18.23333/58.58812 Copepoda Temora 17.2418325
Jan 2011 BY5 15.98333/55.25000 Copepoda Temora 47.3595100
Jan 2011 BY31 18.23333/58.58812 Rotatoria Keratella 0.0196078
Jan 2011 BY5 15.98333/55.25000 Rotatoria Keratella 0.0196078
Jan 2011 BY15 20.05000/57.33333 Rotatoria Synchaeta 0.5490210
Jan 2011 BY31 18.23333/58.58812 Rotatoria Synchaeta 0.1960785
Jan 2011 BY5 15.98333/55.25000 Rotatoria Synchaeta 0.6666660
Jan 2015 BY15 20.05000/57.33333 Copepoda Acartia 4.7200063
Jan 2015 BY15 20.05000/57.33333 Copepoda Centropages 3.5399988
Jan 2015 BY15 20.05000/57.33333 Copepoda Pseudocalanus 15.0200000
Jan 2015 BY15 20.05000/57.33333 Copepoda Temora 10.7066649
Jan 2015 BY15 20.05000/57.33333 Rotatoria Keratella 0.0100000
Jan 2015 BY15 20.05000/57.33333 Rotatoria Synchaeta 0.4400010
library(PlanktonData)
data(zooplankton)
  • Is this dataset a tidy table?
str(zooplankton)

✓ Each variable is in its own column
✓ Each observation is in its own row

… But Coordinates contains 2 values (Longitude/Latitude)

  • Separate Coordinates into 2 columns: Longitude and Latitude
Month_abb Year Station Longitude Latitude Group Taxa Biomass
Jan 2009 BY15 20.05000 57.33333 Copepoda Acartia 6.6503187
Jan 2009 BY31 18.23333 58.58812 Copepoda Acartia 1.8169941
Jan 2009 BY5 15.98333 55.25000 Copepoda Acartia 5.5620974
Jan 2009 BY15 20.05000 57.33333 Copepoda Centropages 5.7385615
Jan 2009 BY31 18.23333 58.58812 Copepoda Centropages 1.2287586
Jan 2009 BY5 15.98333 55.25000 Copepoda Centropages 14.4052240
Jan 2009 BY15 20.05000 57.33333 Copepoda Pseudocalanus 10.5228820
Jan 2009 BY31 18.23333 58.58812 Copepoda Pseudocalanus 5.6339840
Jan 2009 BY5 15.98333 55.25000 Copepoda Pseudocalanus 21.5947750
Jan 2009 BY15 20.05000 57.33333 Copepoda Temora 9.7254882
Jan 2009 BY31 18.23333 58.58812 Copepoda Temora 4.9934649
Jan 2009 BY5 15.98333 55.25000 Copepoda Temora 45.7385290
Jan 2009 BY15 20.05000 57.33333 Rotatoria Synchaeta 0.3921570
Jan 2009 BY31 18.23333 58.58812 Rotatoria Synchaeta 0.4705890
Jan 2009 BY5 15.98333 55.25000 Rotatoria Synchaeta 0.3921570
Jan 2010 BY15 20.05000 57.33333 Copepoda Acartia 2.4673193
Jan 2010 BY31 18.23333 58.58812 Copepoda Acartia 2.2483670
Jan 2010 BY15 20.05000 57.33333 Copepoda Centropages 0.3071893
Jan 2010 BY31 18.23333 58.58812 Copepoda Centropages 0.3856208
Jan 2010 BY31 18.23333 58.58812 Copepoda Eurytemora 0.0849674
Jan 2010 BY15 20.05000 57.33333 Copepoda Pseudocalanus 13.6013010
Jan 2010 BY31 18.23333 58.58812 Copepoda Pseudocalanus 2.6601280
Jan 2010 BY15 20.05000 57.33333 Copepoda Temora 7.5490209
Jan 2010 BY31 18.23333 58.58812 Copepoda Temora 8.4183010
Jan 2010 BY15 20.05000 57.33333 Rotatoria Synchaeta 0.1568628
Jan 2010 BY31 18.23333 58.58812 Rotatoria Synchaeta 0.4117650
Jan 2011 BY15 20.05000 57.33333 Copepoda Acartia 5.0653670
Jan 2011 BY31 18.23333 58.58812 Copepoda Acartia 6.0653592
Jan 2011 BY5 15.98333 55.25000 Copepoda Acartia 9.6209217
Jan 2011 BY15 20.05000 57.33333 Copepoda Centropages 2.9803908
Jan 2011 BY31 18.23333 58.58812 Copepoda Centropages 0.7058820
Jan 2011 BY5 15.98333 55.25000 Copepoda Centropages 1.8692823
Jan 2011 BY31 18.23333 58.58812 Copepoda Eurytemora 1.6078432
Jan 2011 BY15 20.05000 57.33333 Copepoda Pseudocalanus 49.6601350
Jan 2011 BY31 18.23333 58.58812 Copepoda Pseudocalanus 3.3071872
Jan 2011 BY5 15.98333 55.25000 Copepoda Pseudocalanus 3.7777770
Jan 2011 BY15 20.05000 57.33333 Copepoda Temora 36.4313840
Jan 2011 BY31 18.23333 58.58812 Copepoda Temora 17.2418325
Jan 2011 BY5 15.98333 55.25000 Copepoda Temora 47.3595100
Jan 2011 BY31 18.23333 58.58812 Rotatoria Keratella 0.0196078
Jan 2011 BY5 15.98333 55.25000 Rotatoria Keratella 0.0196078
Jan 2011 BY15 20.05000 57.33333 Rotatoria Synchaeta 0.5490210
Jan 2011 BY31 18.23333 58.58812 Rotatoria Synchaeta 0.1960785
Jan 2011 BY5 15.98333 55.25000 Rotatoria Synchaeta 0.6666660
Jan 2015 BY15 20.05000 57.33333 Copepoda Acartia 4.7200063
Jan 2015 BY15 20.05000 57.33333 Copepoda Centropages 3.5399988
Jan 2015 BY15 20.05000 57.33333 Copepoda Pseudocalanus 15.0200000
Jan 2015 BY15 20.05000 57.33333 Copepoda Temora 10.7066649
Jan 2015 BY15 20.05000 57.33333 Rotatoria Keratella 0.0100000
Jan 2015 BY15 20.05000 57.33333 Rotatoria Synchaeta 0.4400010
zooplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/")
  • Unite Group and Taxa to create Group_Taxa
Month_abb Year Station Longitude Latitude Group_Taxa Biomass
Jan 2009 BY15 20.05000 57.33333 Copepoda_Acartia 6.6503187
Jan 2009 BY31 18.23333 58.58812 Copepoda_Acartia 1.8169941
Jan 2009 BY5 15.98333 55.25000 Copepoda_Acartia 5.5620974
Jan 2009 BY15 20.05000 57.33333 Copepoda_Centropages 5.7385615
Jan 2009 BY31 18.23333 58.58812 Copepoda_Centropages 1.2287586
Jan 2009 BY5 15.98333 55.25000 Copepoda_Centropages 14.4052240
Jan 2009 BY15 20.05000 57.33333 Copepoda_Pseudocalanus 10.5228820
Jan 2009 BY31 18.23333 58.58812 Copepoda_Pseudocalanus 5.6339840
Jan 2009 BY5 15.98333 55.25000 Copepoda_Pseudocalanus 21.5947750
Jan 2009 BY15 20.05000 57.33333 Copepoda_Temora 9.7254882
Jan 2009 BY31 18.23333 58.58812 Copepoda_Temora 4.9934649
Jan 2009 BY5 15.98333 55.25000 Copepoda_Temora 45.7385290
Jan 2009 BY15 20.05000 57.33333 Rotatoria_Synchaeta 0.3921570
Jan 2009 BY31 18.23333 58.58812 Rotatoria_Synchaeta 0.4705890
Jan 2009 BY5 15.98333 55.25000 Rotatoria_Synchaeta 0.3921570
Jan 2010 BY15 20.05000 57.33333 Copepoda_Acartia 2.4673193
Jan 2010 BY31 18.23333 58.58812 Copepoda_Acartia 2.2483670
Jan 2010 BY15 20.05000 57.33333 Copepoda_Centropages 0.3071893
Jan 2010 BY31 18.23333 58.58812 Copepoda_Centropages 0.3856208
Jan 2010 BY31 18.23333 58.58812 Copepoda_Eurytemora 0.0849674
Jan 2010 BY15 20.05000 57.33333 Copepoda_Pseudocalanus 13.6013010
Jan 2010 BY31 18.23333 58.58812 Copepoda_Pseudocalanus 2.6601280
Jan 2010 BY15 20.05000 57.33333 Copepoda_Temora 7.5490209
Jan 2010 BY31 18.23333 58.58812 Copepoda_Temora 8.4183010
Jan 2010 BY15 20.05000 57.33333 Rotatoria_Synchaeta 0.1568628
Jan 2010 BY31 18.23333 58.58812 Rotatoria_Synchaeta 0.4117650
Jan 2011 BY15 20.05000 57.33333 Copepoda_Acartia 5.0653670
Jan 2011 BY31 18.23333 58.58812 Copepoda_Acartia 6.0653592
Jan 2011 BY5 15.98333 55.25000 Copepoda_Acartia 9.6209217
Jan 2011 BY15 20.05000 57.33333 Copepoda_Centropages 2.9803908
Jan 2011 BY31 18.23333 58.58812 Copepoda_Centropages 0.7058820
Jan 2011 BY5 15.98333 55.25000 Copepoda_Centropages 1.8692823
Jan 2011 BY31 18.23333 58.58812 Copepoda_Eurytemora 1.6078432
Jan 2011 BY15 20.05000 57.33333 Copepoda_Pseudocalanus 49.6601350
Jan 2011 BY31 18.23333 58.58812 Copepoda_Pseudocalanus 3.3071872
Jan 2011 BY5 15.98333 55.25000 Copepoda_Pseudocalanus 3.7777770
Jan 2011 BY15 20.05000 57.33333 Copepoda_Temora 36.4313840
Jan 2011 BY31 18.23333 58.58812 Copepoda_Temora 17.2418325
Jan 2011 BY5 15.98333 55.25000 Copepoda_Temora 47.3595100
Jan 2011 BY31 18.23333 58.58812 Rotatoria_Keratella 0.0196078
Jan 2011 BY5 15.98333 55.25000 Rotatoria_Keratella 0.0196078
Jan 2011 BY15 20.05000 57.33333 Rotatoria_Synchaeta 0.5490210
Jan 2011 BY31 18.23333 58.58812 Rotatoria_Synchaeta 0.1960785
Jan 2011 BY5 15.98333 55.25000 Rotatoria_Synchaeta 0.6666660
Jan 2015 BY15 20.05000 57.33333 Copepoda_Acartia 4.7200063
Jan 2015 BY15 20.05000 57.33333 Copepoda_Centropages 3.5399988
Jan 2015 BY15 20.05000 57.33333 Copepoda_Pseudocalanus 15.0200000
Jan 2015 BY15 20.05000 57.33333 Copepoda_Temora 10.7066649
Jan 2015 BY15 20.05000 57.33333 Rotatoria_Keratella 0.0100000
Jan 2015 BY15 20.05000 57.33333 Rotatoria_Synchaeta 0.4400010
zooplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  tidyr::unite(col = "Group_Taxa", c(Group, Taxa))
  • Make a wide table with columns showing the Biomass values for each Group_Taxa
Month_abb Year Station Longitude Latitude Copepoda_Acartia Copepoda_Centropages Copepoda_Pseudocalanus Copepoda_Temora Rotatoria_Synchaeta Copepoda_Eurytemora Rotatoria_Keratella Cladocera_Bosmina Cladocera_Evadne Cladocera_Podon
Jan 2009 BY15 20.05000 57.33333 6.6503187 5.7385615 10.5228820 9.7254882 0.3921570 NA NA NA NA NA
Jan 2009 BY31 18.23333 58.58812 1.8169941 1.2287586 5.6339840 4.9934649 0.4705890 NA NA NA NA NA
Jan 2009 BY5 15.98333 55.25000 5.5620974 14.4052240 21.5947750 45.7385290 0.3921570 NA NA NA NA NA
Jan 2010 BY15 20.05000 57.33333 2.4673193 0.3071893 13.6013010 7.5490209 0.1568628 NA NA NA NA NA
Jan 2010 BY31 18.23333 58.58812 2.2483670 0.3856208 2.6601280 8.4183010 0.4117650 0.0849674 NA NA NA NA
Jan 2011 BY15 20.05000 57.33333 5.0653670 2.9803908 49.6601350 36.4313840 0.5490210 NA NA NA NA NA
Jan 2011 BY31 18.23333 58.58812 6.0653592 0.7058820 3.3071872 17.2418325 0.1960785 1.6078432 0.0196078 NA NA NA
Jan 2011 BY5 15.98333 55.25000 9.6209217 1.8692823 3.7777770 47.3595100 0.6666660 NA 0.0196078 NA NA NA
Jan 2015 BY15 20.05000 57.33333 4.7200063 3.5399988 15.0200000 10.7066649 0.4400010 NA 0.0100000 NA NA NA
Jan 2016 BY15 20.05000 57.33333 2.4866676 6.2266595 NA 17.6533153 0.9200010 NA NA NA NA NA
Jan 2016 BY31 18.23333 58.58812 1.4616663 1.5533334 2.8066655 0.9066675 0.2000001 NA NA NA NA NA
Jan 2016 BY5 15.98333 55.25000 8.6866656 6.1800078 1.8949995 17.8799791 0.8199990 NA 0.0500000 NA NA NA
Jan 2017 BY15 20.05000 57.33333 3.7200071 2.7533319 5.5399920 14.0999830 0.2799999 NA NA NA NA NA
Jan 2017 BY31 18.23333 58.58812 3.0466673 0.5666663 2.5600009 0.8799999 0.4400010 0.2600000 NA NA NA NA
Jan 2017 BY5 15.98333 55.25000 NA NA 2.1333311 NA NA NA NA NA NA NA
Jan 2018 BY5 15.98333 55.25000 16.6199952 7.3466575 6.5233210 19.6533255 0.0800001 NA NA NA NA NA
Jan 2019 BY5 15.98333 55.25000 23.1800075 10.7066665 4.9000011 57.7600260 0.7200000 NA 0.6199995 0.0333332 2.000001 NA
Jan 2019 BY31 18.23333 58.58812 6.8733320 1.3066646 1.9599995 12.6533285 0.2400000 0.5066654 NA NA NA NA
Jan 2020 BY15 20.05000 57.33333 8.3133342 6.7866703 7.3066620 14.4400057 0.0399999 NA 0.0100000 NA NA NA
Jan 2020 BY5 15.98333 55.25000 24.2666615 13.5400054 2.6533343 23.5466793 0.1400001 NA NA NA NA NA
Jan 2021 BY31 18.23333 58.58812 2.3799989 0.7699998 0.6720001 0.8866667 0.0399999 0.0333333 0.0100000 NA NA NA
Feb 2009 BY15 20.05000 57.33333 4.5620851 6.5359550 2.0718956 7.7647077 0.3137250 NA NA NA NA NA
Feb 2009 BY31 18.23333 58.58812 2.1045759 2.3398701 4.5653635 4.8496740 0.3529410 NA NA NA NA NA
Feb 2009 BY5 15.98333 55.25000 2.9084958 19.3333680 5.5457519 40.3137360 0.3137250 NA NA NA NA NA
Feb 2010 BY15 20.05000 57.33333 7.8562004 2.0915040 1.8718946 9.8235324 0.0392157 NA NA NA NA NA
Feb 2010 BY5 15.98333 55.25000 8.4052182 6.9934715 15.5490030 13.4248350 0.1960785 NA NA NA NA NA
Feb 2013 BY31 18.23333 58.58812 1.1552277 1.1013079 2.9616028 11.4738560 0.0882352 0.0849674 0.0098039 NA NA NA
Feb 2013 BY5 15.98333 55.25000 NA NA 4.8575163 NA NA NA NA NA NA NA
Feb 2014 BY31 18.23333 58.58812 0.4509799 0.9248368 4.8163384 7.6928012 0.0882352 NA 0.0098039 NA NA NA
Feb 2014 BY5 15.98333 55.25000 7.6078378 3.6209163 7.9607840 37.1111384 0.1568628 NA NA NA NA NA
Feb 2015 BY5 15.98333 55.25000 5.1966737 4.0933392 7.2983329 7.0599986 0.1800000 NA NA 0.0166667 NA NA
Feb 2015 BY15 20.05000 57.33333 1.5866677 0.3533334 1.9693346 0.7000003 0.8600010 NA NA NA NA NA
Feb 2015 BY31 18.23333 58.58812 1.4066646 0.8000000 NA 0.3599991 0.3600000 NA NA NA NA NA
Feb 2016 BY15 20.05000 57.33333 1.6966665 3.3833390 NA 0.8833339 0.2199999 NA 0.0100000 NA NA NA
Feb 2016 BY31 18.23333 58.58812 0.9933338 2.7733390 0.7226666 3.2799991 0.1599999 0.1733329 NA NA NA NA
Feb 2016 BY5 15.98333 55.25000 9.0199986 38.9333170 1.5266680 18.7200200 1.3599990 NA 0.0199999 NA NA NA
Feb 2017 BY15 20.05000 57.33333 3.8166672 2.2700006 11.6366740 5.9499977 0.0500001 NA NA NA NA NA
Feb 2017 BY31 18.23333 58.58812 4.9733265 1.5399979 2.3733321 2.3066642 0.2400000 NA 0.0100000 NA NA NA
Feb 2019 BY31 18.23333 58.58812 4.9299932 0.8466660 1.7399985 3.7933321 1.0899990 0.2866668 NA NA 0.399999 NA
Feb 2019 BY15 20.05000 57.33333 29.6599990 24.3200150 14.4266760 34.0533090 0.1599999 NA 0.0199999 NA NA NA
Feb 2019 BY5 15.98333 55.25000 7.9866830 6.6533407 4.3573344 25.1732890 0.3920010 NA NA NA NA NA
Feb 2020 BY5 15.98333 55.25000 17.5700062 11.0200060 2.9599981 20.4599863 0.0800001 NA NA NA NA NA
Feb 2021 BY31 18.23333 58.58812 3.1666738 1.8000002 1.2466660 0.9133316 NA NA 0.0100000 NA NA NA
Feb 2021 BY5 15.98333 55.25000 3.2200064 4.9919910 6.2186631 1.6240000 NA NA NA NA NA NA
Mar 2007 BY31 18.23333 58.58812 2.7687899 1.2843136 24.2298540 0.3251634 0.6813720 0.0326798 0.1250002 NA NA NA
Mar 2008 BY31 18.23333 58.58812 5.0522894 1.2222226 1.6307192 0.6274512 0.9215700 NA 0.0588236 NA NA NA
Mar 2009 BY15 20.05000 57.33333 1.9052292 7.9084956 4.3856250 4.2810415 0.0980391 NA NA NA NA NA
Mar 2009 BY31 18.23333 58.58812 1.9542489 0.9869278 1.3748356 2.9019625 0.7156856 NA 0.0049020 NA NA NA
Mar 2010 BY31 18.23333 58.58812 0.6200976 0.1535947 1.1405225 0.6895424 1.8676485 0.4232027 0.0098039 0.0980392 NA NA
Mar 2010 BY15 20.05000 57.33333 7.5849613 5.4836513 6.2875840 7.7516296 0.4509810 NA NA NA NA NA
zooplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  tidyr::unite(col = "Group_Taxa", c(Group, Taxa)) |> 
  tidyr::pivot_wider(names_from = Group_Taxa, values_from = Biomass) 

Manipulate the data with dplyr

Once the data are tidy, we can start using the dplyr package to process them. The main advantage of this package is that its function names are self-explanatory and the functions are simple to use.
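To give a feel for how the dplyr verbs read, here is a minimal sketch on a made-up data frame. The object `toy`, its values, and the column `Biomass_double` are hypothetical and only serve as an illustration; the exercises below use the real phytoplankton data.

```r
library(dplyr)

# Hypothetical toy data, not taken from PlanktonData
toy <- data.frame(
  Station = c("A", "A", "B", "B"),
  Taxa    = c("Diatoms", "Other", "Diatoms", "Other"),
  Biomass = c(2.0, 0.2, 4.0, 1.0)
)

toy_summary <- toy |>
  filter(Biomass >= 0.5) |>                      # keep rows matching a condition
  mutate(Biomass_double = Biomass * 2) |>        # add a new column
  rename(Site = Station) |>                      # rename a column
  select(-Taxa) |>                               # drop a column
  group_by(Site) |>                              # define groups
  summarise(Mean_double = mean(Biomass_double))  # one row per group

toy_summary
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes them easy to chain with the pipe.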

Exercises

  • Load the dataset phytoplankton in your environment
Month_abb Year Station Coordinates Taxa Biomass
Jan 2007 BY15 20.05000/57.33333 Cyanobacteria 1.4170670
Jan 2007 BY15 20.05000/57.33333 Diatoms 1.7625112
Jan 2007 BY31 18.23333/58.58812 Diatoms 0.1557741
Jan 2007 BY5 15.98333/55.25000 Diatoms 1.6393078
Jan 2007 BY15 20.05000/57.33333 Dinoflagellates 0.6395350
Jan 2007 BY31 18.23333/58.58812 Dinoflagellates 0.0588896
Jan 2007 BY5 15.98333/55.25000 Dinoflagellates 0.0915522
Jan 2007 BY15 20.05000/57.33333 Mesodinium 0.4351159
Jan 2007 BY31 18.23333/58.58812 Mesodinium 0.0747435
Jan 2007 BY5 15.98333/55.25000 Mesodinium 1.4315310
Jan 2007 BY15 20.05000/57.33333 Other 1.3033312
Jan 2007 BY31 18.23333/58.58812 Other 0.2156630
Jan 2007 BY5 15.98333/55.25000 Other 1.1125434
Feb 2007 BY15 20.05000/57.33333 Cyanobacteria 0.0383238
Feb 2007 BY31 18.23333/58.58812 Cyanobacteria 0.0045549
Feb 2007 BY15 20.05000/57.33333 Diatoms 0.2685554
Feb 2007 BY31 18.23333/58.58812 Diatoms 0.4660469
Feb 2007 BY15 20.05000/57.33333 Dinoflagellates 0.5857735
Feb 2007 BY31 18.23333/58.58812 Dinoflagellates 2.7702498
Feb 2007 BY15 20.05000/57.33333 Mesodinium 2.3027740
Feb 2007 BY31 18.23333/58.58812 Mesodinium 4.6692790
Feb 2007 BY15 20.05000/57.33333 Other 2.5899855
Feb 2007 BY31 18.23333/58.58812 Other 1.0055273
Mar 2007 BY15 20.05000/57.33333 Cyanobacteria 0.0332965
Mar 2007 BY31 18.23333/58.58812 Cyanobacteria 0.7736520
Mar 2007 BY15 20.05000/57.33333 Diatoms 1.8827332
Mar 2007 BY31 18.23333/58.58812 Diatoms 19.7081666
Mar 2007 BY5 15.98333/55.25000 Diatoms 0.4082621
Mar 2007 BY15 20.05000/57.33333 Dinoflagellates 0.9278628
Mar 2007 BY31 18.23333/58.58812 Dinoflagellates 3.6800348
Mar 2007 BY5 15.98333/55.25000 Dinoflagellates 3.3149470
Mar 2007 BY15 20.05000/57.33333 Mesodinium 5.6033645
Mar 2007 BY31 18.23333/58.58812 Mesodinium 5.0108820
Mar 2007 BY5 15.98333/55.25000 Mesodinium 7.1609390
Mar 2007 BY15 20.05000/57.33333 Other 3.0645188
Mar 2007 BY31 18.23333/58.58812 Other 1.6856426
Mar 2007 BY5 15.98333/55.25000 Other 2.9057876
Apr 2007 BY15 20.05000/57.33333 Cyanobacteria 0.9965123
Apr 2007 BY31 18.23333/58.58812 Cyanobacteria 1.5807710
Apr 2007 BY5 15.98333/55.25000 Cyanobacteria 0.0410806
Apr 2007 BY15 20.05000/57.33333 Diatoms 0.0045963
Apr 2007 BY31 18.23333/58.58812 Diatoms 24.1518964
Apr 2007 BY15 20.05000/57.33333 Dinoflagellates 156.7101643
Apr 2007 BY31 18.23333/58.58812 Dinoflagellates 71.8011718
Apr 2007 BY5 15.98333/55.25000 Dinoflagellates 67.6874160
Apr 2007 BY15 20.05000/57.33333 Mesodinium 15.7203510
Apr 2007 BY31 18.23333/58.58812 Mesodinium 12.4941918
Apr 2007 BY5 15.98333/55.25000 Mesodinium 36.8173030
Apr 2007 BY15 20.05000/57.33333 Other 3.8423489
Apr 2007 BY31 18.23333/58.58812 Other 8.1863325
data(phytoplankton)
  • As with the zooplankton dataset, separate Coordinates as Longitude and Latitude
Month_abb Year Station Longitude Latitude Taxa Biomass
Jan 2007 BY15 20.05000 57.33333 Cyanobacteria 1.4170670
Jan 2007 BY15 20.05000 57.33333 Diatoms 1.7625112
Jan 2007 BY31 18.23333 58.58812 Diatoms 0.1557741
Jan 2007 BY5 15.98333 55.25000 Diatoms 1.6393078
Jan 2007 BY15 20.05000 57.33333 Dinoflagellates 0.6395350
Jan 2007 BY31 18.23333 58.58812 Dinoflagellates 0.0588896
Jan 2007 BY5 15.98333 55.25000 Dinoflagellates 0.0915522
Jan 2007 BY15 20.05000 57.33333 Mesodinium 0.4351159
Jan 2007 BY31 18.23333 58.58812 Mesodinium 0.0747435
Jan 2007 BY5 15.98333 55.25000 Mesodinium 1.4315310
Jan 2007 BY15 20.05000 57.33333 Other 1.3033312
Jan 2007 BY31 18.23333 58.58812 Other 0.2156630
Jan 2007 BY5 15.98333 55.25000 Other 1.1125434
Feb 2007 BY15 20.05000 57.33333 Cyanobacteria 0.0383238
Feb 2007 BY31 18.23333 58.58812 Cyanobacteria 0.0045549
Feb 2007 BY15 20.05000 57.33333 Diatoms 0.2685554
Feb 2007 BY31 18.23333 58.58812 Diatoms 0.4660469
Feb 2007 BY15 20.05000 57.33333 Dinoflagellates 0.5857735
Feb 2007 BY31 18.23333 58.58812 Dinoflagellates 2.7702498
Feb 2007 BY15 20.05000 57.33333 Mesodinium 2.3027740
Feb 2007 BY31 18.23333 58.58812 Mesodinium 4.6692790
Feb 2007 BY15 20.05000 57.33333 Other 2.5899855
Feb 2007 BY31 18.23333 58.58812 Other 1.0055273
Mar 2007 BY15 20.05000 57.33333 Cyanobacteria 0.0332965
Mar 2007 BY31 18.23333 58.58812 Cyanobacteria 0.7736520
Mar 2007 BY15 20.05000 57.33333 Diatoms 1.8827332
Mar 2007 BY31 18.23333 58.58812 Diatoms 19.7081666
Mar 2007 BY5 15.98333 55.25000 Diatoms 0.4082621
Mar 2007 BY15 20.05000 57.33333 Dinoflagellates 0.9278628
Mar 2007 BY31 18.23333 58.58812 Dinoflagellates 3.6800348
Mar 2007 BY5 15.98333 55.25000 Dinoflagellates 3.3149470
Mar 2007 BY15 20.05000 57.33333 Mesodinium 5.6033645
Mar 2007 BY31 18.23333 58.58812 Mesodinium 5.0108820
Mar 2007 BY5 15.98333 55.25000 Mesodinium 7.1609390
Mar 2007 BY15 20.05000 57.33333 Other 3.0645188
Mar 2007 BY31 18.23333 58.58812 Other 1.6856426
Mar 2007 BY5 15.98333 55.25000 Other 2.9057876
Apr 2007 BY15 20.05000 57.33333 Cyanobacteria 0.9965123
Apr 2007 BY31 18.23333 58.58812 Cyanobacteria 1.5807710
Apr 2007 BY5 15.98333 55.25000 Cyanobacteria 0.0410806
Apr 2007 BY15 20.05000 57.33333 Diatoms 0.0045963
Apr 2007 BY31 18.23333 58.58812 Diatoms 24.1518964
Apr 2007 BY15 20.05000 57.33333 Dinoflagellates 156.7101643
Apr 2007 BY31 18.23333 58.58812 Dinoflagellates 71.8011718
Apr 2007 BY5 15.98333 55.25000 Dinoflagellates 67.6874160
Apr 2007 BY15 20.05000 57.33333 Mesodinium 15.7203510
Apr 2007 BY31 18.23333 58.58812 Mesodinium 12.4941918
Apr 2007 BY5 15.98333 55.25000 Mesodinium 36.8173030
Apr 2007 BY15 20.05000 57.33333 Other 3.8423489
Apr 2007 BY31 18.23333 58.58812 Other 8.1863325
phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") 
  • What is the class of the Longitude and Latitude columns?

They are characters.

phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  str()
#> tibble [2,284 × 7] (S3: tbl_df/tbl/data.frame)
#>  $ Month_abb: chr [1:2284] "Jan" "Jan" "Jan" "Jan" ...
#>  $ Year     : chr [1:2284] "2007" "2007" "2007" "2007" ...
#>  $ Station  : chr [1:2284] "BY15" "BY15" "BY31" "BY5" ...
#>  $ Longitude: chr [1:2284] "20.05000" "20.05000" "18.23333" "15.98333" ...
#>  $ Latitude : chr [1:2284] "57.33333" "57.33333" "58.58812" "55.25000" ...
#>  $ Taxa     : chr [1:2284] "Cyanobacteria" "Diatoms" "Diatoms" "Diatoms" ...
#>  $ Biomass  : num [1:2284] 1.417 1.763 0.156 1.639 0.64 ...
  • If they are not numeric, convert them to numeric
?as.numeric
?mutate
phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  dplyr::mutate(Longitude = as.numeric(Longitude),
                Latitude = as.numeric(Latitude))
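As an alternative to the mutate step, tidyr::separate has a convert argument that attempts the type conversion while splitting. The sketch below uses a hypothetical two-row stand-in for the Coordinates column rather than the full dataset.

```r
library(tidyr)

# Hypothetical two-row stand-in for the Coordinates column
coords <- data.frame(
  Coordinates = c("20.05000/57.33333", "18.23333/58.58812")
)

# convert = TRUE asks separate() to convert the new columns
# to a suitable type, so the text "20.05000" becomes a number
converted <- coords |>
  separate(col = Coordinates, into = c("Longitude", "Latitude"),
           sep = "/", convert = TRUE)

class(converted$Longitude)
class(converted$Latitude)
```

Using mutate with as.numeric, as in the solution above, is more explicit about which columns change type; convert = TRUE is simply shorter.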
  • Keep only the data from the years 2008 to 2010
example("%in%")
?filter
phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  dplyr::mutate(Longitude = as.numeric(Longitude),
                Latitude = as.numeric(Latitude)) |> 
  dplyr::filter(Year %in% 2008:2010)
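The str() output shown earlier lists Year as a character column, yet the filter compares it with the integers 2008:2010. The short sketch below, using a made-up vector of years, illustrates why this still works.

```r
# Year is stored as text in the dataset, so we mimic that here
years <- c("2007", "2008", "2010", "2011")

# %in% is built on match(), which coerces both sides to a
# common type before comparing, so "2008" matches 2008
keep <- years %in% 2008:2010

years[keep]
```

If you prefer not to rely on this coercion, converting Year with as.numeric inside mutate first is a reasonable option.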
  • Keep only the data with biomass higher than or equal to 0.5 µg L⁻¹
phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  dplyr::mutate(Longitude = as.numeric(Longitude),
                Latitude = as.numeric(Latitude)) |> 
  dplyr::filter(Year %in% 2008:2010,
                Biomass >= 0.5)
  • Rename Month_abb as Month
Month Year Station Longitude Latitude Taxa Biomass
Jan 2008 BY15 20.05000 57.33333 Cyanobacteria 0.7330137
Jan 2008 BY15 20.05000 57.33333 Diatoms 1.6245314
Jan 2008 BY5 15.98333 55.25000 Diatoms 3.0818455
Jan 2008 BY15 20.05000 57.33333 Dinoflagellates 0.6737093
Jan 2008 BY5 15.98333 55.25000 Dinoflagellates 0.8869308
Jan 2008 BY15 20.05000 57.33333 Mesodinium 0.5885970
Jan 2008 BY31 18.23333 58.58812 Mesodinium 0.6849050
Jan 2008 BY5 15.98333 55.25000 Mesodinium 2.0836469
Jan 2008 BY15 20.05000 57.33333 Other 7.3249900
Jan 2008 BY31 18.23333 58.58812 Other 3.1265832
Jan 2008 BY5 15.98333 55.25000 Other 3.3561542
Feb 2008 BY15 20.05000 57.33333 Diatoms 1.2680965
Feb 2008 BY5 15.98333 55.25000 Diatoms 2.8023814
Feb 2008 BY15 20.05000 57.33333 Dinoflagellates 1.1329852
Feb 2008 BY5 15.98333 55.25000 Mesodinium 20.0837900
Feb 2008 BY15 20.05000 57.33333 Other 16.9987771
Feb 2008 BY31 18.23333 58.58812 Other 8.6802596
Feb 2008 BY5 15.98333 55.25000 Other 16.0946263
Mar 2008 BY15 20.05000 57.33333 Diatoms 1.6448488
Mar 2008 BY31 18.23333 58.58812 Diatoms 30.4619174
Mar 2008 BY5 15.98333 55.25000 Diatoms 4.6464811
Mar 2008 BY15 20.05000 57.33333 Dinoflagellates 13.6085017
Mar 2008 BY31 18.23333 58.58812 Dinoflagellates 5.7408130
Mar 2008 BY5 15.98333 55.25000 Dinoflagellates 1.1267941
Mar 2008 BY15 20.05000 57.33333 Mesodinium 7.3048790
Mar 2008 BY31 18.23333 58.58812 Mesodinium 7.6901795
Mar 2008 BY5 15.98333 55.25000 Mesodinium 14.4894850
Mar 2008 BY15 20.05000 57.33333 Other 200.9616193
Mar 2008 BY31 18.23333 58.58812 Other 9.3419906
Mar 2008 BY5 15.98333 55.25000 Other 4.0108086
Apr 2008 BY15 20.05000 57.33333 Cyanobacteria 3.2442822
Apr 2008 BY31 18.23333 58.58812 Cyanobacteria 0.5975883
Apr 2008 BY15 20.05000 57.33333 Diatoms 11.5565268
Apr 2008 BY31 18.23333 58.58812 Diatoms 16.4387791
Apr 2008 BY5 15.98333 55.25000 Diatoms 1.3419904
Apr 2008 BY15 20.05000 57.33333 Dinoflagellates 67.6764830
Apr 2008 BY31 18.23333 58.58812 Dinoflagellates 24.7080846
Apr 2008 BY5 15.98333 55.25000 Dinoflagellates 9.2278796
Apr 2008 BY15 20.05000 57.33333 Mesodinium 5.3117510
Apr 2008 BY31 18.23333 58.58812 Mesodinium 12.5595830
Apr 2008 BY5 15.98333 55.25000 Mesodinium 11.9859809
Apr 2008 BY15 20.05000 57.33333 Other 97.1777810
Apr 2008 BY31 18.23333 58.58812 Other 52.1543315
Apr 2008 BY5 15.98333 55.25000 Other 73.0003628
May 2008 BY15 20.05000 57.33333 Cyanobacteria 1.9917544
May 2008 BY31 18.23333 58.58812 Cyanobacteria 1.8096993
May 2008 BY31 18.23333 58.58812 Diatoms 3.7689240
May 2008 BY15 20.05000 57.33333 Dinoflagellates 49.6794582
May 2008 BY31 18.23333 58.58812 Dinoflagellates 15.4334030
May 2008 BY5 15.98333 55.25000 Dinoflagellates 7.7954351
phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  dplyr::mutate(Longitude = as.numeric(Longitude),
                Latitude = as.numeric(Latitude)) |> 
  dplyr::filter(Year %in% 2008:2010,
                Biomass >= 0.5) |> 
  dplyr::rename(Month = Month_abb)
  • Keep all the columns except Longitude
Month Year Station Latitude Taxa Biomass
Jan 2008 BY15 57.33333 Cyanobacteria 0.7330137
Jan 2008 BY15 57.33333 Diatoms 1.6245314
Jan 2008 BY5 55.25000 Diatoms 3.0818455
Jan 2008 BY15 57.33333 Dinoflagellates 0.6737093
Jan 2008 BY5 55.25000 Dinoflagellates 0.8869308
Jan 2008 BY15 57.33333 Mesodinium 0.5885970
Jan 2008 BY31 58.58812 Mesodinium 0.6849050
Jan 2008 BY5 55.25000 Mesodinium 2.0836469
Jan 2008 BY15 57.33333 Other 7.3249900
Jan 2008 BY31 58.58812 Other 3.1265832
Jan 2008 BY5 55.25000 Other 3.3561542
Feb 2008 BY15 57.33333 Diatoms 1.2680965
Feb 2008 BY5 55.25000 Diatoms 2.8023814
Feb 2008 BY15 57.33333 Dinoflagellates 1.1329852
Feb 2008 BY5 55.25000 Mesodinium 20.0837900
Feb 2008 BY15 57.33333 Other 16.9987771
Feb 2008 BY31 58.58812 Other 8.6802596
Feb 2008 BY5 55.25000 Other 16.0946263
Mar 2008 BY15 57.33333 Diatoms 1.6448488
Mar 2008 BY31 58.58812 Diatoms 30.4619174
Mar 2008 BY5 55.25000 Diatoms 4.6464811
Mar 2008 BY15 57.33333 Dinoflagellates 13.6085017
Mar 2008 BY31 58.58812 Dinoflagellates 5.7408130
Mar 2008 BY5 55.25000 Dinoflagellates 1.1267941
Mar 2008 BY15 57.33333 Mesodinium 7.3048790
Mar 2008 BY31 58.58812 Mesodinium 7.6901795
Mar 2008 BY5 55.25000 Mesodinium 14.4894850
Mar 2008 BY15 57.33333 Other 200.9616193
Mar 2008 BY31 58.58812 Other 9.3419906
Mar 2008 BY5 55.25000 Other 4.0108086
Apr 2008 BY15 57.33333 Cyanobacteria 3.2442822
Apr 2008 BY31 58.58812 Cyanobacteria 0.5975883
Apr 2008 BY15 57.33333 Diatoms 11.5565268
Apr 2008 BY31 58.58812 Diatoms 16.4387791
Apr 2008 BY5 55.25000 Diatoms 1.3419904
Apr 2008 BY15 57.33333 Dinoflagellates 67.6764830
Apr 2008 BY31 58.58812 Dinoflagellates 24.7080846
Apr 2008 BY5 55.25000 Dinoflagellates 9.2278796
Apr 2008 BY15 57.33333 Mesodinium 5.3117510
Apr 2008 BY31 58.58812 Mesodinium 12.5595830
Apr 2008 BY5 55.25000 Mesodinium 11.9859809
Apr 2008 BY15 57.33333 Other 97.1777810
Apr 2008 BY31 58.58812 Other 52.1543315
Apr 2008 BY5 55.25000 Other 73.0003628
May 2008 BY15 57.33333 Cyanobacteria 1.9917544
May 2008 BY31 58.58812 Cyanobacteria 1.8096993
May 2008 BY31 58.58812 Diatoms 3.7689240
May 2008 BY15 57.33333 Dinoflagellates 49.6794582
May 2008 BY31 58.58812 Dinoflagellates 15.4334030
May 2008 BY5 55.25000 Dinoflagellates 7.7954351
phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  dplyr::mutate(Longitude = as.numeric(Longitude),
                Latitude = as.numeric(Latitude)) |> 
  dplyr::filter(Year %in% 2008:2010,
                Biomass >= 0.5) |> 
  dplyr::rename(Month = Month_abb) |> 
  dplyr::select(-Longitude)
  • Make a summary table showing the average monthly Biomass of each Taxa at each Station
Station Month Taxa Average_Biomass
BY15 Apr Cyanobacteria 2.2431900
BY15 Apr Diatoms 23.1445282
BY15 Apr Dinoflagellates 83.9909404
BY15 Apr Mesodinium 29.6087615
BY15 Apr Other 36.7386392
BY15 Aug Cyanobacteria 30.9818961
BY15 Aug Diatoms 2.2636098
BY15 Aug Dinoflagellates 9.3918903
BY15 Aug Mesodinium 9.9576955
BY15 Aug Other 9.6425753
BY15 Feb Diatoms 1.3669538
BY15 Feb Dinoflagellates 0.8888890
BY15 Feb Mesodinium 3.0700606
BY15 Feb Other 6.6884657
BY15 Jan Cyanobacteria 0.7330137
BY15 Jan Diatoms 1.3262260
BY15 Jan Dinoflagellates 0.7177460
BY15 Jan Mesodinium 1.3992310
BY15 Jan Other 2.9148920
BY15 Jul Cyanobacteria 53.9321258
BY15 Jul Diatoms 1.9936481
BY15 Jul Dinoflagellates 16.1664696
BY15 Jul Mesodinium 8.0555951
BY15 Jul Other 22.4351224
BY15 Jun Cyanobacteria 26.5576427
BY15 Jun Diatoms 0.6320936
BY15 Jun Dinoflagellates 27.3757633
BY15 Jun Mesodinium 33.1540950
BY15 Jun Other 46.1942387
BY15 Mar Diatoms 1.4455245
BY15 Mar Dinoflagellates 6.2031314
BY15 Mar Mesodinium 6.5886593
BY15 Mar Other 68.6788097
BY15 May Cyanobacteria 1.4308471
BY15 May Dinoflagellates 59.9251837
BY15 May Mesodinium 74.1766803
BY15 May Other 54.9198979
BY15 Nov Cyanobacteria 1.1144862
BY15 Nov Diatoms 30.4754041
BY15 Nov Dinoflagellates 1.0517013
BY15 Nov Mesodinium 3.8012673
BY15 Nov Other 8.4426735
BY15 Oct Cyanobacteria 4.9325064
BY15 Oct Diatoms 3.4127018
BY15 Oct Dinoflagellates 3.9303931
BY15 Oct Mesodinium 5.5022190
BY15 Oct Other 5.4928720
BY15 Sep Cyanobacteria 1.9732444
BY15 Sep Diatoms 3.6823882
BY15 Sep Dinoflagellates 5.8051506
phytoplankton |> 
  tidyr::separate(col = Coordinates, into = c("Longitude", "Latitude"), sep = "/") |> 
  dplyr::mutate(Longitude = as.numeric(Longitude),
                Latitude = as.numeric(Latitude)) |> 
  dplyr::filter(Year %in% 2008:2010,
                Biomass >= 0.5) |> 
  dplyr::rename(Month = Month_abb) |> 
  dplyr::select(-Longitude) |> 
  dplyr::group_by(Station, Month, Taxa) |> 
  dplyr::summarise(Average_Biomass = mean(Biomass)) |> 
  dplyr::ungroup()
Tip

After a group_by() followed by summarise(), the result can still be grouped. Pipe it into ungroup() to remove the remaining grouping before further operations.
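As a minimal sketch of why ungroup() matters, the example below uses a small made-up table (toy and its values are assumptions for illustration, not PlanktonData). It shows that summarise() keeps part of the grouping by default, and that ungroup() or the .groups argument removes it.

```r
# Toy data (made up for illustration, not taken from PlanktonData)
toy <- data.frame(
  Station = c("A", "A", "B", "B"),
  Month   = c("Jan", "Jan", "Jan", "Feb"),
  Biomass = c(1, 3, 2, 4)
)

# By default summarise() drops only the last grouping level,
# so the result is still grouped by Station
still_grouped <- toy |>
  dplyr::group_by(Station, Month) |>
  dplyr::summarise(Average_Biomass = mean(Biomass))

dplyr::group_vars(still_grouped)
#> [1] "Station"

# ungroup() removes the remaining grouping
no_groups <- still_grouped |>
  dplyr::ungroup()

dplyr::group_vars(no_groups)
#> character(0)

# The .groups argument gives the same ungrouped result in one step
dropped <- toy |>
  dplyr::group_by(Station, Month) |>
  dplyr::summarise(Average_Biomass = mean(Biomass), .groups = "drop")
```

Using .groups = "drop" also silences the message that summarise() prints about the grouped output.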

Data visualisation with ggplot2

In this part we will build a plot step by step using the grammar of graphics in ggplot2.

Exercises

  • Load the ggplot2 library and import the zooplankton dataset
library(ggplot2)
data(zooplankton)
  • Create a dataframe named df containing only Centropages biomass
df <- zooplankton |> 
  dplyr::filter(Taxa == "Centropages")
  • Initiate a ggplot using the data df, with Month_abb on the x-axis and Biomass on the y-axis

ggplot(data = df,
       mapping = aes(x = Month_abb,
                     y = Biomass))

The months are ordered alphabetically. This can be changed using the code below:

df_ordered <- df |> 
  dplyr::mutate(Month_abb = factor(Month_abb, levels = month.abb))

Then initiate the plot again, this time with the dataframe df_ordered

ggplot(data = df_ordered,
       mapping = aes(x = Month_abb,
                     y = Biomass))
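To see why the factor conversion changes the axis order, here is a small base R sketch on a made-up vector (months is an assumption for illustration). Character values sort alphabetically, whereas a factor sorts by the order of its levels.

```r
# Toy vector of month abbreviations (made up for illustration)
months <- c("Mar", "Jan", "Dec", "Feb")

# Character vectors sort alphabetically
sort(months)
#> [1] "Dec" "Feb" "Jan" "Mar"

# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec"
months_fct <- factor(months, levels = month.abb)

# Factors sort by the order of their levels, which is chronological here
as.character(sort(months_fct))
#> [1] "Jan" "Feb" "Mar" "Dec"
```

ggplot2 follows the same rule: a discrete axis built from a factor uses the level order.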

  • Add the biomass data as points

ggplot(data = df_ordered,
       mapping = aes(x = Month_abb,
                     y = Biomass)) +
  geom_point()
  • Separate the plot into 3 facets corresponding to the stations

ggplot(data = df_ordered,
       mapping = aes(x = Month_abb,
                     y = Biomass)) +
  geom_point()+
  facet_wrap(~Station)
  • Color the points by Year

ggplot(data = df_ordered,
       mapping = aes(x = Month_abb,
                     y = Biomass)) +
  geom_point(mapping = aes(color = Year))+
  facet_wrap(~Station)
  • Rotate the x-axis text by 90°, change the color, size and font face of the strip text, and add a panel background color

ggplot(data = df_ordered,
       mapping = aes(x = Month_abb,
                     y = Biomass)) +
  geom_point(mapping = aes(color = Year)) +
  facet_wrap(~Station) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90),
        strip.text = element_text(color = "firebrick",
                                  size = 12,
                                  face = "bold"),
        panel.background = element_rect(fill = "forestgreen"))
  • Change the x-axis name to Month and add a title My plot

p <- ggplot(data = df_ordered,
       mapping = aes(x = Month_abb,
                     y = Biomass)) +
  geom_point(mapping = aes(color = Year)) +
  facet_wrap(~Station) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90),
        strip.text = element_text(color = "firebrick",
                                  size = 12,
                                  face = "bold"),
        panel.background = element_rect(fill = "forestgreen")) +
  labs(x = "Month",
       title = "My plot")
p
  • Now create a dataframe df_summary that contains the mean and the standard error of Centropages monthly biomass by Station between 2007 and 2021
#> `summarise()` has grouped output by 'Month_abb'. You can override using the
#> `.groups` argument.

Have a look at the function se from the PlanktonData package

df_summary <- df |>
  dplyr::group_by(Month_abb, Station) |> 
  dplyr::summarise(Average = mean(Biomass),
                   SE = se(Biomass)) |> # se() is provided by the PlanktonData package
  dplyr::ungroup()
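The source of se() is not shown here, so the sketch below is only an assumption about what such a function typically does: it uses the usual definition of the standard error of the mean (standard deviation divided by the square root of the number of observations). The name se_sketch and its na.rm argument are hypothetical and not part of PlanktonData.

```r
# Hypothetical helper, NOT the PlanktonData implementation:
# standard error of the mean = standard deviation / sqrt(number of observations)
se_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]  # drop missing values before counting
  sd(x) / sqrt(length(x))
}

se_sketch(c(1, 2, 3, 4, 5))
#> [1] 0.7071068
```

If the results differ from se(), check how the package version handles missing values.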
  • Add a geom_bar showing the average monthly biomass to the plot

  • We can add another geom_* that uses a different dataset with +
  • By default, geom_bar counts the rows in each group (stat = "count"). When the bar heights are already in the data, set stat = "identity"
p +
  geom_bar(data = df_summary,
           mapping = aes(x = Month_abb,
                         y = Average),
           stat = "identity")
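As a small sketch of what stat = "identity" does, the example below uses a made-up summary table (toy_summary is an assumption for illustration). It compares geom_bar(stat = "identity") with geom_col(), which is the ggplot2 shorthand for bars whose heights come from the data.

```r
library(ggplot2)

# Toy summary table (made up for illustration)
toy_summary <- data.frame(
  Month   = factor(c("Jan", "Feb", "Mar"), levels = month.abb),
  Average = c(2, 5, 3)
)

# stat = "identity" tells geom_bar() to use the y values as bar heights
p_identity <- ggplot(toy_summary, aes(x = Month, y = Average)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for the same thing
p_col <- ggplot(toy_summary, aes(x = Month, y = Average)) +
  geom_col()

# Both layers end up with the same bar heights
layer_data(p_identity)$y
layer_data(p_col)$y
```

Either form works in the exercise; geom_col() simply saves the extra argument.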
  • Add a geom_errorbar showing the mean ± standard error

p +
  geom_bar(data = df_summary,
           mapping = aes(x = Month_abb,
                         y = Average),
           stat = "identity") +
  geom_errorbar(data = df_summary,
                mapping = aes(x = Month_abb,
                              y = Average,
                              ymin = Average - SE,
                              ymax = Average + SE))
  • Change the fill of the bars to #EFF675 and the width of the errorbars to 0

p +
  geom_bar(data = df_summary,
           mapping = aes(x = Month_abb,
                         y = Average),
           stat = "identity",
           fill = "#EFF675") +
  geom_errorbar(data = df_summary,
                mapping = aes(x = Month_abb,
                              y = Average,
                              ymin = Average - SE,
                              ymax = Average + SE),
                width = 0)
  • Make these changes from the initial plot:
    • Change the points to a jitter plot that is in front of the bars
    • Change the shape of the jitter to the shape 21
    • Change the color of the bars to black and fill them as well as the jitter according to the Station
    • Remove the panel background and change the color of the strip text to black

The order matters: layers are drawn in the order they are added, so the first geom_* ends up at the back of the plot.
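A minimal sketch of the layer order, using made-up data (toy and p_layers are assumptions for illustration): the bars are added first and therefore sit behind the points.

```r
library(ggplot2)

# Toy data (made up for illustration)
toy <- data.frame(x = c("a", "b"), y = c(3, 5))

p_layers <- ggplot(toy, aes(x = x, y = y)) +
  geom_col(fill = "grey80") +  # layer 1: drawn first, ends up at the back
  geom_point(size = 3)         # layer 2: drawn last, ends up in front

# layer_data() returns the computed data of the i-th layer, in drawing order
names(layer_data(p_layers, 1))  # bar layer, includes xmin and xmax
names(layer_data(p_layers, 2))  # point layer, includes shape and size
```

Swapping the two geom_* calls would hide part of each point behind its bar.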

First we need to reorder the months in df_summary, as we did for df_ordered

df_summary <- df_summary |> 
  dplyr::mutate(Month_abb = factor(Month_abb, levels = month.abb))

And then we can plot

ggplot(data = df_ordered,
       mapping = aes(x = Month_abb,
                     y = Biomass)) +
  
  # First start with the bars
  geom_bar(data = df_summary,
           mapping = aes(x = Month_abb,
                         y = Average,
                         fill = Station),
           stat = "identity",
           alpha = 0.2, # <------ this is the transparency argument
           col = "black") + # <------- this adds the color around the bars
  # Add the errorbar
  geom_errorbar(data = df_summary,
                mapping = aes(x = Month_abb,
                              y = Average,
                              ymin = Average - SE,
                              ymax = Average + SE),
                width = 0) +
  # Replace geom_point with geom_jitter
  geom_jitter(mapping = aes(fill = Station), # <----- filled according to the Station
              shape = 21, # <---- and drawn with shape 21
              # With geom_jitter, we can choose how much jitter we want. I suggest jittering only along the x-axis and not along the y-axis
              width = .1,
              height = 0) +
  facet_wrap(~Station) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90),
        strip.text = element_text(color = "black",
                                  size = 12,
                                  face = "bold"),
        panel.background = element_rect(fill = "transparent"),
        legend.position = "none") +
  labs(x = "Month",
       title = "My plot") +
  # This is optional, but you can choose the fill colors like this
  scale_fill_manual(values = c("#01665e","#d8b365", "#762a83"))

Optional exercises

Create this plot using the data zooplankton and phytoplankton:

  • Combine the datasets zooplankton and phytoplankton to create a dataset named plankton
?rbind()

This function requires the same number and names of columns!

data("zooplankton") ; data("phytoplankton")

plankton <-
  phytoplankton |> 
  dplyr::mutate(Group = "Phytoplankton") |> # <---- We need to add a Group column to rbind it
  rbind(zooplankton)
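The column requirement of rbind() can be sketched on two small made-up tables (phyto_toy and zoo_toy are assumptions for illustration, not the real datasets). One table lacks the Group column, so we add it before binding.

```r
# Toy tables (made up for illustration)
phyto_toy <- data.frame(Taxa = c("Diatoms", "Cyanobacteria"),
                        Biomass = c(3.7, 1.9))
zoo_toy   <- data.frame(Group = "Copepoda",
                        Taxa = "Acartia",
                        Biomass = 6.6)

# rbind(phyto_toy, zoo_toy) would fail at this point because the columns differ.
# Adding the missing column first makes the two tables compatible.
phyto_toy$Group <- "Phytoplankton"

# Data frames are matched by column name, so the column order may differ
combined <- rbind(phyto_toy, zoo_toy)
combined
```

The combined table keeps the column order of the first data frame.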
  • From the dataset plankton, keep only the values for Copepoda and Phytoplankton
plankton |> 
  dplyr::filter(Group %in% c("Phytoplankton", "Copepoda"))
  • Add a Season column containing the seasons (Winter = Jan, Feb, Mar; Spring = Apr, May, Jun; Summer = Jul, Aug, Sep; Autumn = Oct, Nov, Dec)
  • We can also specify that this is a factor that has levels
?case_when()
?month.abb

To select values within a vector, we can also use []. For example:

x <- c("One", "Two", "Three", "Four")
x[1]
#> [1] "One"
x[1:3]
#> [1] "One"   "Two"   "Three"
plankton |> 
  dplyr::filter(Group %in% c("Phytoplankton", "Copepoda")) |> 
  dplyr::mutate(Season = dplyr::case_when(
    Month_abb %in% month.abb[1:3] ~ "Winter",
    Month_abb %in% month.abb[4:6] ~ "Spring",
    Month_abb %in% month.abb[7:9] ~ "Summer",
    Month_abb %in% month.abb[10:12] ~ "Autumn"),
    Season = factor(Season, levels = c("Winter", "Spring", "Summer", "Autumn")))
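The same logic can be tried on a plain vector first. This is a minimal sketch with made-up input (months is an assumption for illustration) showing how each case_when() condition maps months to a season, and why the factor levels are given explicitly.

```r
# Toy vector of month abbreviations (made up for illustration)
months <- c("Feb", "May", "Aug", "Nov")

# Each condition is checked in turn; the first TRUE one gives the value
season <- dplyr::case_when(
  months %in% month.abb[1:3]   ~ "Winter",
  months %in% month.abb[4:6]   ~ "Spring",
  months %in% month.abb[7:9]   ~ "Summer",
  months %in% month.abb[10:12] ~ "Autumn"
)
season
#> [1] "Winter" "Spring" "Summer" "Autumn"

# Without explicit levels, the factor would be ordered alphabetically
season_fct <- factor(season, levels = c("Winter", "Spring", "Summer", "Autumn"))
levels(season_fct)
#> [1] "Winter" "Spring" "Summer" "Autumn"
```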
  • Sum the Biomass by Year, Season and Station for these 2 groups, then transform the data by taking the log of this sum
plankton |> 
  dplyr::filter(Group %in% c("Phytoplankton", "Copepoda")) |> 
  dplyr::mutate(Season = dplyr::case_when(
    Month_abb %in% month.abb[1:3] ~ "Winter",
    Month_abb %in% month.abb[4:6] ~ "Spring",
    Month_abb %in% month.abb[7:9] ~ "Summer",
    Month_abb %in% month.abb[10:12] ~ "Autumn"),
    Season = factor(Season, levels = c("Winter", "Spring", "Summer", "Autumn"))) |> 
  dplyr::group_by(Season, Year, Station, Group) |> 
  dplyr::summarise(Tot_Biomass_log = log(sum(Biomass))) |> 
  dplyr::ungroup()
  • Make the data wide and save it as df (and remove all the rows with NA values)
df <- 
  plankton |> 
  dplyr::filter(Group %in% c("Phytoplankton", "Copepoda")) |> 
  dplyr::mutate(Season = dplyr::case_when(
    Month_abb %in% month.abb[1:3] ~ "Winter",
    Month_abb %in% month.abb[4:6] ~ "Spring",
    Month_abb %in% month.abb[7:9] ~ "Summer",
    Month_abb %in% month.abb[10:12] ~ "Autumn"),
    Season = factor(Season, levels = c("Winter", "Spring", "Summer", "Autumn"))) |> 
  dplyr::group_by(Season, Year, Station, Group) |> 
  dplyr::summarise(Tot_Biomass_log = log(sum(Biomass))) |> 
  dplyr::ungroup() |> 
  tidyr::pivot_wider(names_from = Group, values_from = Tot_Biomass_log) |> 
  tidyr::drop_na()
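A small sketch of the last two steps, on a made-up long table (toy_long is an assumption for illustration): pivot_wider() creates one column per group and fills missing combinations with NA, which drop_na() then removes.

```r
# Toy long table (made up for illustration); Station B has no Copepoda value
toy_long <- data.frame(
  Station = c("A", "A", "B"),
  Group   = c("Phytoplankton", "Copepoda", "Phytoplankton"),
  Value   = c(2.1, 1.3, 2.8)
)

# One column per Group; Station B gets NA in the Copepoda column
toy_wide <- toy_long |>
  tidyr::pivot_wider(names_from = Group, values_from = Value)

# Keep only the rows where both groups have a value
toy_complete <- toy_wide |>
  tidyr::drop_na()
```

Only the rows with both a Phytoplankton and a Copepoda value can be used in a scatter plot of one against the other.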
  • Start to plot by using the geom_point
    • The shape is changing according to the Season
    • The fill is changing according to the Season
    • The size of the points is fixed to 3
ggplot(data = df,
       mapping = aes(x = Phytoplankton,
                     y = Copepoda)) +
  geom_point(mapping = aes(shape = Season,
                           fill = Season),
             size = 3)
  • Specify that the shape should be 21, 22, 23 and 24
ggplot(data = df,
       mapping = aes(x = Phytoplankton,
                     y = Copepoda)) +
  geom_point(mapping = aes(shape = Season,
                           fill = Season),
             size = 3) +
  scale_shape_manual(values = c(21, 22, 23, 24)) 
  
  • Split the plot vertically based on the Station
ggplot(data = df,
       mapping = aes(x = Phytoplankton,
                     y = Copepoda)) +
  geom_point(mapping = aes(shape = Season,
                           fill = Season),
             size = 3) +
  scale_shape_manual(values = c(21, 22, 23, 24)) +
  facet_wrap(~Station, ncol = 1)
  • Add the regression lines for each Station and Season without the standard error
ggplot(data = df,
       mapping = aes(x = Phytoplankton,
                     y = Copepoda)) +
  geom_point(mapping = aes(shape = Season,
                           fill = Season),
             size = 3) +
  scale_shape_manual(values = c(21, 22, 23, 24)) +
  facet_wrap(~Station, ncol = 1) +
  geom_smooth(mapping = aes(group = Season,
                            col = Season),
              se = FALSE,
              method = "lm")
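With method = "lm", geom_smooth() fits a straight line (y ~ x by default) separately for each group within each facet. The sketch below reproduces that idea by hand with lm() on made-up data (toy, fits and slopes are assumptions for illustration), using one model per Season.

```r
# Toy data (made up for illustration): two seasons with exactly linear relationships
toy <- data.frame(
  Season        = rep(c("Winter", "Summer"), each = 3),
  Phytoplankton = c(1, 2, 3, 1, 2, 3),
  Copepoda      = c(3, 5, 7, 9, 8, 7)  # Winter: y = 1 + 2x, Summer: y = 10 - x
)

# One linear model per season, mirroring the grouping by Season
fits <- lapply(split(toy, toy$Season),
               function(d) lm(Copepoda ~ Phytoplankton, data = d))

# Slope of each fitted line
slopes <- sapply(fits, function(m) unname(coef(m)["Phytoplankton"]))
slopes
```

In the tutorial plot the same split happens once more per Station, because each facet is fitted on its own data.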
  • Modify the labels
ggplot(data = df,
       mapping = aes(x = Phytoplankton,
                     y = Copepoda)) +
  geom_point(mapping = aes(shape = Season,
                           fill = Season),
             size = 3) +
  scale_shape_manual(values = c(21, 22, 23, 24)) +
  facet_wrap(~Station, ncol = 1) +
  geom_smooth(mapping = aes(group = Season,
                            col = Season),
              se = FALSE,
              method = "lm") +
  labs(x = "log(Phytoplankton Biomass)",
       y = "log(Copepod Biomass)",
       title = "Relationship between total phytoplankton and\ntotal copepod biomass across stations and seasons")
  • Modify the theme of the plot
ggplot(data = df,
       mapping = aes(x = Phytoplankton,
                     y = Copepoda)) +
  geom_point(mapping = aes(shape = Season,
                           fill = Season),
             size = 3) +
  scale_shape_manual(values = c(21, 22, 23, 24)) +
  facet_wrap(~Station, ncol = 1) +
  geom_smooth(mapping = aes(group = Season,
                            col = Season),
              se = FALSE,
              method = "lm") +
  labs(x = "log(Phytoplankton Biomass)",
       y = "log(Copepod Biomass)",
       title = "Relationship between total phytoplankton and\ntotal copepod biomass across stations and seasons") +
  theme_bw() +
  theme(legend.position = "bottom",
        strip.background = element_blank(),
        strip.text = element_text(hjust = 1,
                                  face = "bold"),
        panel.grid = element_blank()) +
  scale_color_manual(values = c("#ADA9B7", "#B9E28C", "#FFB84C", "#DFD3C3")) +
  scale_fill_manual(values = c("#ADA9B7", "#B9E28C", "#FFB84C", "#DFD3C3"))

Create this plot using the two datasets

  • Start with loading the phytoplankton dataset
data("phytoplankton")
  • Keep only the months Aug and Sep
phytoplankton |> 
  dplyr::filter(Month_abb %in% c("Aug", "Sep"))
  • Initiate a plot with Month_abb on the x-axis, Biomass on the y-axis, and the fill set according to Taxa
phytoplankton |> 
  dplyr::filter(Month_abb %in% c("Aug", "Sep")) |> 
  ggplot(mapping = aes(x = Month_abb,
                       y = Biomass,
                       fill = Taxa))
Tip

You can pipe your dataset directly into ggplot. The piped dataset is then used as the data argument.
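A minimal sketch of this equivalence on made-up data (toy, p_arg and p_pipe are assumptions for illustration): the pipe passes the dataset as the first argument of ggplot(), which is data.

```r
library(ggplot2)

# Toy data (made up for illustration)
toy <- data.frame(x = 1:3, y = c(2, 4, 8))

# Dataset given explicitly through the data argument
p_arg <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_point()

# Dataset piped in: |> is evaluated before +, so the layers are added afterwards
p_pipe <- toy |>
  ggplot(mapping = aes(x = x, y = y)) +
  geom_point()

# Both plots are built from the same values
layer_data(p_arg)$y
layer_data(p_pipe)$y
```

Note that the pipe is only used up to ggplot(); the layers are still combined with +.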

  • Add a geom_bar

Look at which position adjustments can be applied to geom_bar

phytoplankton |> 
  dplyr::filter(Month_abb %in% c("Aug", "Sep")) |> 
  ggplot(mapping = aes(x = Month_abb,
                       y = Biomass,
                       fill = Taxa)) +
  geom_bar(position = "fill",
           stat = "identity")
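position = "fill" stacks the bars and rescales each stack to a total height of 1, so the plot shows relative rather than absolute biomass. The quantity it displays can be sketched by hand on made-up data (toy and toy_rel are assumptions for illustration):

```r
# Toy data (made up for illustration)
toy <- data.frame(
  Month   = c("Aug", "Aug", "Sep", "Sep"),
  Taxa    = c("Diatoms", "Other", "Diatoms", "Other"),
  Biomass = c(1, 3, 2, 2)
)

# Share of each taxon within its month: the quantity that position = "fill" displays
toy_rel <- toy |>
  dplyr::group_by(Month) |>
  dplyr::mutate(Relative = Biomass / sum(Biomass)) |>
  dplyr::ungroup()

toy_rel$Relative
#> [1] 0.25 0.75 0.50 0.50
```

Within each month the shares add up to 1, which is why every bar in the filled plot has the same height.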
  • Separate the plot in facets by Station
phytoplankton |> 
  dplyr::filter(Month_abb %in% c("Aug", "Sep")) |> 
  ggplot(mapping = aes(x = Month_abb,
                       y = Biomass,
                       fill = Taxa)) +
  geom_bar(position = "fill",
           stat = "identity") +
  facet_wrap(~Station, ncol = 1) 
  • Modify the labels, fill values, theme and the y-axis and save the plot as p1
p1 <-
  phytoplankton |> 
  dplyr::filter(Month_abb %in% c("Aug", "Sep")) |>

  ggplot(mapping = aes(x = Month_abb,
                       y = Biomass,
                       fill = Taxa)) +
  geom_bar(position = "fill",
           stat = "identity") +
  facet_wrap(~Station, ncol = 1) +

  scale_fill_manual("Phytoplankton", # This changes the title of the legend
                    values = c("#F5A65B", "#8BBD8B", "#8B9474", "#B388EB", "#28587B"))+
  theme_void() +
  theme(axis.text.x = element_text(color = "black"),
        strip.text = element_text(color = "black", hjust = .1),
        axis.title.y = element_text(color = "black", angle = 90)) +
  labs(y = "Relative biomass")
  • Do the same for the zooplankton
p2 <- 
  zooplankton |> 
  dplyr::filter(Month_abb %in% c("Aug", "Sep")) |>

  ggplot(mapping = aes(x = Month_abb,
                       y = Biomass,
                       fill = Taxa)) +
  geom_bar(position = "fill",
           stat = "identity") +
  facet_wrap(~Station, ncol = 1) +


  scale_fill_manual("Zooplankton",
                    values = c("#816C61", "#E7DFC6", "#E7BBE3", "#7CC6FE", "#EE6C4D", "#B6C9BB", "#9684A1", "#BFEDC1", "#B19994", "black")) +
  theme_void() +
  theme(axis.text = element_text(color ="black"),
        strip.text = element_text(color = "transparent"))
  • Combine the two plots together using the package patchwork
library(patchwork)
p1 + p2 +
  plot_layout(guides = "collect") # This collects the legends together

Using phytoplankton, create this plot:

This is one solution, but several others exist!

  • Rename the taxa as either Cyanobacteria or Other and convert Month_abb to a factor
phytoplankton |> 
  dplyr::mutate(Taxa = ifelse(Taxa == "Cyanobacteria", "Cyanobacteria", "Other"),
                Month = factor(Month_abb, levels = month.abb))
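The ifelse() call keeps one category and lumps everything else together. A minimal base R sketch on a made-up vector (taxa and lumped are assumptions for illustration):

```r
# Toy vector of taxa (made up for illustration)
taxa <- c("Cyanobacteria", "Diatoms", "Dinoflagellates", "Cyanobacteria")

# Keep "Cyanobacteria" as is and relabel every other taxon as "Other"
lumped <- ifelse(taxa == "Cyanobacteria", "Cyanobacteria", "Other")
lumped
#> [1] "Cyanobacteria" "Other"         "Other"         "Cyanobacteria"

# Number of values in each of the two categories
table(lumped)
```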
  
  • Calculate the sum of Biomass by Month, Taxa, Station and Year
phytoplankton |> 
  dplyr::mutate(Taxa = ifelse(Taxa == "Cyanobacteria", "Cyanobacteria", "Other"),
                Month = factor(Month_abb, levels = month.abb)) |> 
  dplyr::group_by(Taxa, Station, Month, Year) |> 
  dplyr::summarise(Tot_biomass = sum(Biomass)) |> 
  dplyr::ungroup()
  • Pivot the table, remove the NA values, and calculate the proportion of Cyanobacteria
phytoplankton |> 
  dplyr::mutate(Taxa = ifelse(Taxa == "Cyanobacteria", "Cyanobacteria", "Other"),
                Month = factor(Month_abb, levels = month.abb)) |> 
  dplyr::group_by(Taxa, Station, Month, Year) |> 
  dplyr::summarise(Tot_biomass = sum(Biomass)) |> 
  dplyr::ungroup() |> 
  tidyr::pivot_wider(names_from = Taxa, values_from = Tot_biomass) |> 
  tidyr::drop_na() |> 
  dplyr::mutate(Total = Cyanobacteria + Other,
                Cyano_prop = (Cyanobacteria/Total) * 100) |> 
  dplyr::select(Station, Month, Year, Cyano_prop) # This is not needed but it is easier to have a simpler table
  • Initiate the plot using geom_tile and separate according to the Station
phytoplankton |> 
  dplyr::mutate(Taxa = ifelse(Taxa == "Cyanobacteria", "Cyanobacteria", "Other"),
                Month = factor(Month_abb, levels = month.abb)) |> 
  dplyr::group_by(Taxa, Station, Month, Year) |> 
  dplyr::summarise(Tot_biomass = sum(Biomass)) |> 
  dplyr::ungroup() |> 
  tidyr::pivot_wider(names_from = Taxa, values_from = Tot_biomass) |> 
  tidyr::drop_na() |> 
  dplyr::mutate(Total = Cyanobacteria + Other,
                Cyano_prop = (Cyanobacteria/Total) * 100) |> 
  dplyr::select(Station, Month, Year, Cyano_prop) |>  # This is not needed but it is easier to have a simpler table
  ggplot(mapping = aes(x = Month,
                       y = Year,
                       fill = Cyano_prop)) +
  geom_tile(col = "black") +
  facet_grid(~Station)
  • Implement some theme changes
phytoplankton |> 
  dplyr::mutate(Taxa = ifelse(Taxa == "Cyanobacteria", "Cyanobacteria", "Other"),
                Month = factor(Month_abb, levels = month.abb)) |> 
  dplyr::group_by(Taxa, Station, Month, Year) |> 
  dplyr::summarise(Tot_biomass = sum(Biomass)) |> 
  dplyr::ungroup() |> 
  tidyr::pivot_wider(names_from = Taxa, values_from = Tot_biomass) |> 
  tidyr::drop_na() |> 
  dplyr::mutate(Total = Cyanobacteria + Other,
                Cyano_prop = (Cyanobacteria/Total) * 100) |> 
  dplyr::select(Station, Month, Year, Cyano_prop) |> 
  ggplot(mapping = aes(x = Month,
                       y = Year,
                       fill = Cyano_prop)) +
  geom_tile(col = "black") +
  facet_grid(~Station) +
  coord_fixed() +
  labs(x = NULL, y = NULL)+
  theme_minimal() +
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 90, vjust=0.5),
        panel.grid = element_blank()) +
  scale_fill_gradient2("Proportion of\nCyanobacteria", low = "#E3DFFF", high = "#DC0073", midpoint = 50)